Abstract

In this paper we apply evolutionary optimization techniques to compute optimal rule-based 
trading strategies based on financial sentiment data. The sentiment data was extracted from 
the social media service StockTwits to accommodate the level of bullishness or bearishness 
of the online trading community towards certain stocks. Numerical results for all stocks 
from the Dow Jones Industrial Average (DJIA) index are presented and a comparison to 
classical risk-return portfolio selection is provided. 

Keywords: Evolutionary optimization, sentiment analysis, technical trading, portfolio optimization


1 Introduction 

In this paper we apply evolutionary optimization techniques to compute optimal rule-based 
trading strategies based on financial sentiment data. The number of application areas in the 
field of sentiment analysis is huge, see especially m for a comprehensive overview. The field of 
Finance attracted research on how to use specific financial sentiment data to find or optimize 
investment opportunities and strategies, see e.g. 13 : m , and [ 20 ] . 

This paper is organized as follows. Section [2] describes the financial sentiment data used for
the evolutionary approach to optimize trading strategies and portfolios. Section [3] presents an
evolutionary optimization algorithm to create optimal trading strategies using financial sentiment
data and shows how to build a portfolio using single-asset trading strategies. Section [4] contains
numerical results obtained with the presented algorithm and a comparison to classical risk-return
portfolio optimization strategies as proposed by Markowitz, using stock market data from all stocks in
the Dow Jones Industrial Average (DJIA) index. Section [5] concludes the paper.


2 Financial Sentiments 

We apply financial sentiment data created by PsychSignal.¹ The PsychSignal technology utilizes
the wisdom of crowds in order to extract meaningful analysis, which is not achievable through the
study of single individuals; see [12] for a general introduction to the measurement of psychological
states through verbal behavior. Let a group of individuals together be a crowd. Not all crowds
are wise; however, four elements have been identified that are required to form a wise crowd:
diversity of opinion, independence, decentralization and aggregation, as proposed by [21]. These

1 http://www.psychsignal.com/
Table 1: PsychSignal.com StockTwits sentiment data format per asset.

Variable       Content
Date           Day of the analyzed data.
$I_{bull}$     Each message's language strength of bullishness on a 0-4 scale.
$I_{bear}$     Each message's language strength of bearishness on a 0-4 scale.
$n_{bull}$     Total count of bullish sentiment messages.
$n_{bear}$     Total count of bearish sentiment messages.
$n_{total}$    Total number of messages.


four elements are sometimes present in some forms of social media platforms, e.g. in the financial
community StockTwits, from which the crowd wisdom used for the evolutionary approach
described in this paper is derived.

Emotions are regarded as being unique to individual persons and occurring over brief moments
in time. Let a mood be a set of emotions together. In order to quantify the collective
mood of a crowd, distinct emotions of individual members within the crowd must be quantified.
Subsequently, individual emotions can be aggregated to form a collective crowd mood. PsychSignal's
Natural Language Processing Engine is tuned to the social media language of individual
traders and investors based on the general findings of e.g. [2] and of [22] for the financial domain.
The engine further targets and extracts emotions and attitudes in that distinct language and
categorizes as well as quantifies these emotions from text. The methodology is based on the
linguistic inquiry and word count (LIWC) project, which is available publicly.² See also [IB] for
a description of an algorithm on how to generate such a semantic lexicon for financial sentiment
data directly.

The main idea is to assign a degree of bullishness or bearishness to stocks depending on the
messages sent through StockTwits,³ which utilizes Twitter's application programming
interface (API) to integrate StockTwits as a social media platform of market news, sentiment and
stock-picking tools. StockTwits utilizes so-called cashtags with the stock ticker symbol, similar
to the Twitter hashtag, as a way of indexing people's thoughts and ideas about companies and
their respective stocks. The available sentiment data format is described in Tab. [1]. The data
was obtained through Quandl,⁴ where PsychSignal's sentiment data for stocks can be accessed
easily.

Both intensities $I_{bull}$ and $I_{bear}$ are measured on a real-valued scale from 0 to 4, where 0
means no bullish/bearish sentiment and 4 the strongest bullish/bearish sentiment. We normalize
these values to the interval [0, 1] by dividing the respective value by 4 and obtain the variables
$i_{bull}$ and $i_{bear}$. Furthermore, we create two relative variables for the number of bullish and
bearish messages, i.e. $r_{bull} = n_{bull}/n_{total}$ as well as $r_{bear} = n_{bear}/n_{total}$, such that we
end up with the final data format we are going to use for subsequent analysis. See Tab. [2] for an
example of the stock with the ticker symbol BA (The Boeing Company).
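The normalization described above can be sketched in a few lines. The paper's own implementation was written in R; Python is used here purely for illustration. The record type `RawSentiment`, the function `normalize`, and the raw input values are hypothetical, and treating days without messages as a ratio of 0 is an assumption that the text does not state.

```python
from dataclasses import dataclass


@dataclass
class RawSentiment:
    """One daily record in the raw format of Tab. 1 (field names are illustrative)."""
    date: str
    intensity_bull: float  # bullish intensity on the 0-4 scale
    intensity_bear: float  # bearish intensity on the 0-4 scale
    n_bull: int
    n_bear: int
    n_total: int


def normalize(raw):
    """Return (i_bull, i_bear, r_bull, r_bear), each in [0, 1]."""
    i_bull = raw.intensity_bull / 4.0
    i_bear = raw.intensity_bear / 4.0
    if raw.n_total > 0:
        r_bull = raw.n_bull / raw.n_total
        r_bear = raw.n_bear / raw.n_total
    else:
        # Assumption: no messages on that day means no relative signal.
        r_bull = r_bear = 0.0
    return i_bull, i_bear, r_bull, r_bear


# Hypothetical raw record, chosen only to illustrate the arithmetic.
row = RawSentiment("2011-01-03", 2.36, 0.0, 2, 0, 4)
i_bull, i_bear, r_bull, r_bear = normalize(row)
print(round(i_bull, 2), i_bear, r_bull, r_bear)
```

With these inputs the intensity 2.36 on the 0-4 scale becomes 0.59, and 2 bullish messages out of 4 give a relative value of 0.5.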

3 Evolutionary Investment Strategy Generation 

We aim at creating an evolutionary optimization approach to generate optimal trading strategies
for single stocks based on the sentiment analysis data described above. Evolutionary and Genetic
Programming techniques have been applied to various financial problems successfully. See


2 http://www.liwc.net/
3 http://www.stocktwits.com/
4 http://www.quandl.com/
Table 2: Sentiment values for stock BA starting at the first trading days in 2011.

Date          $i_{bull}$   $i_{bear}$   $r_{bull}$   $r_{bear}$   $n_{total}$
2011-01-03    0.59         0            0.50         0            4
2011-01-04    0            0            0            0            1
2011-01-05    0            0.11         0            1            1
2011-01-06    0.61         0            0.25         0            4
2011-01-07    0.52         0            0.17         0            6
2011-01-11    0.67         0            1            0
especially the series of books on Natural Computing in Finance for more examples, i.e. 0 , 0 ,
and [7]. Generating automatic trading rules has been a core topic in this domain, see especially 
0, IE 0, HE and the references therein. 

One main technique in the field of meta-heuristics and technical trading is to let the optimizer
generate optimal investment rules given a set of technical indicators. Instead of using a
variety of technical indicators for generating an optimal trading rule, we use the financial
sentiment data described above to create investment rules. We start with a simplified rule-set
approach, in which the rules are generated by a special genotype encoding. Furthermore, as we
intend to create a portfolio allocation out of the single-asset strategies and additionally
focus on stocks only, we do not allow for shorting assets, i.e. the decision is whether to enter or
exit a long position on a daily basis. The rule is based on the respective sentiment values, such
that this basic rule-set can be defined as shown in Eq. (1).

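\[
\begin{aligned}
&[\text{IF}(i_{bull} > v_1)]_{b_1}\ [\text{AND}]_{b_1 \wedge b_2}\ [\text{IF}(r_{bull} > v_2)]_{b_2}\ \text{THEN long position.}\\
&[\text{IF}(i_{bear} > v_3)]_{b_3}\ [\text{AND}]_{b_3 \wedge b_4}\ [\text{IF}(r_{bear} > v_4)]_{b_4}\ \text{THEN exit position.}
\end{aligned}
\tag{1}
\]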

Each chromosome within the evolutionary optimization process consists of the values

$(b_1, b_2, b_3, b_4, v_1, v_2, v_3, v_4)$,

where the $b$ values are binary encoded (0 or 1) and the $v$ values are real values between 0 and 1. The
$b$ values indicate whether the respective part of the rule notated in square brackets is included
(1) or not (0), while the $v$ values represent the concrete values within the conditions. Consider
the following example: the (randomly chosen) chromosome (0, 1, 1, 1, 0.4, 0.3, 0.5, 0.2) results in the
rule-set shown in Eq. (2).


\[
\begin{aligned}
&\text{IF}(r_{bull} > 0.3)\ \text{THEN long position.}\\
&\text{IF}(i_{bear} > 0.5)\ \text{AND IF}(r_{bear} > 0.2)\ \text{THEN exit position.}
\end{aligned}
\tag{2}
\]

In this encoding, both $b_1 + b_2 \geq 1$ and $b_3 + b_4 \geq 1$ must hold, so that there is at least
one condition for entering and one for leaving the long position. With three admissible settings for
each pair, we end up with nine different possible assignments for $b$. Since an evolutionary operation
may violate this structure, a repair operator has to be applied after each such operation.
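The encoding and the repair step can be illustrated with a short sketch. Python is used for illustration only (the paper states that R was used), and the function names `repair`, `pair_holds` and `signals` are hypothetical. The text does not say how a day is handled on which both the entry and the exit rule fire, so the sketch simply returns both signals and leaves that decision open.

```python
import random
from itertools import product


def repair(b, rng=random):
    """Ensure each pair (b1, b2) and (b3, b4) has at least one flag set."""
    b = list(b)
    for lo in (0, 2):
        if b[lo] + b[lo + 1] == 0:
            b[lo + rng.randrange(2)] = 1  # choose one of the two at random
    return tuple(b)


def pair_holds(flag_a, cond_a, flag_b, cond_b):
    """AND over the conditions whose flag is 1; False if no flag is set."""
    active = [cond for flag, cond in ((flag_a, cond_a), (flag_b, cond_b)) if flag]
    return bool(active) and all(active)


def signals(chromosome, i_bull, r_bull, i_bear, r_bear):
    """Return (enter_long, exit_long) for one day of sentiment values."""
    b1, b2, b3, b4, v1, v2, v3, v4 = chromosome
    enter = pair_holds(b1, i_bull > v1, b2, r_bull > v2)
    leave = pair_holds(b3, i_bear > v3, b4, r_bear > v4)
    return enter, leave


# Enumerate the admissible binary parts: three choices per pair, nine in total.
valid_b = [b for b in product((0, 1), repeat=4)
           if b[0] + b[1] >= 1 and b[2] + b[3] >= 1]

# The example chromosome from the text, applied to the first BA row of Tab. 2.
example = (0, 1, 1, 1, 0.4, 0.3, 0.5, 0.2)
enter, leave = signals(example, i_bull=0.59, r_bull=0.50, i_bear=0.0, r_bear=0.0)
print(len(valid_b), enter, leave)
```

For this day only the condition on the relative bullish count is active for entry (0.50 > 0.3), so the rule signals a long position and no exit.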

Each chromosome is evaluated by testing the respective trading strategy
on the in-sample testing set of length $T$, i.e. we obtain a series of returns $r_1, \ldots, r_T$ for each
chromosome, which can be evaluated with different financial metrics. The following strategy
performance characteristics are considered:

• The cumulative return $r$, and the standard deviation $\sigma$.
Table 3: Statistical summary of sentiment values for stock BA 2010-2014.

              Minimum   First Quartile   Median   Mean     Third Quartile   Maximum
$i_{bull}$    0         0                0.3821   0.2987   0.5050           0.8250
$i_{bear}$    0         0                0        0.1763   0.3887           0.86


• The maximum drawdown $d$, and the Value-at-Risk $v_\alpha$ ($\alpha = 0.05$), as well as

• the ratio $s$ of expected return divided by the standard deviation, which is based on the
Sharpe-ratio proposed by [19].
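The performance characteristics listed above can be computed from a return series as in the following sketch. It is an illustration in Python rather than the paper's R code, and several conventions are assumptions not fixed by the text: returns are treated as simple returns that compound, the standard deviation is the sample standard deviation, and the Value-at-Risk is taken as a simple empirical lower quantile. The function name `performance` is hypothetical.

```python
import statistics


def performance(returns, alpha=0.05):
    """Summarize a return series r_1, ..., r_T (needs at least two values)."""
    wealth = peak = 1.0
    max_drawdown = 0.0
    for r in returns:
        wealth *= 1.0 + r                      # compound simple returns
        peak = max(peak, wealth)
        max_drawdown = max(max_drawdown, 1.0 - wealth / peak)

    sigma = statistics.stdev(returns)          # sample standard deviation
    ordered = sorted(returns)
    # Assumed quantile convention: element at index floor(alpha * T).
    var = ordered[min(len(ordered) - 1, int(alpha * len(ordered)))]
    s = statistics.mean(returns) / sigma if sigma > 0 else 0.0

    return {"r": wealth - 1.0, "sigma": sigma, "d": max_drawdown,
            "var": var, "s": s}


# Hypothetical three-day series, chosen so that the drawdown is easy to follow.
metrics = performance([0.10, -0.50, 0.20])
print(metrics)
```

In this toy series the wealth path is 1.10, 0.55, 0.66, so the largest peak-to-trough loss is one half and the cumulative return is negative.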

We use simple mutation operators for new populations because the chromosome encoding
of the investment rule described above is short, i.e. contains only eight genes. The following
mutation operators are applied:

• $b$ binary flip: One randomly selected gene of the binary $b$ part is flipped (0 to 1 or 1 to 0). The
resulting chromosome needs to be repaired with the repair operator, which, if necessary,
randomly determines which of the two genes of the affected pair is set to 1.

• $v$ random mutation: One randomly selected gene of the real-valued $v$ part is replaced by a
uniform random variable between 0 and 1.

• $v$ mutation divided in half: One randomly selected gene of the real-valued $v$ part is divided in
half. The rationale of this operation is that the intensities of bullishness and bearishness
are often small, see e.g. Tab. [3] for the statistics of the sentiment values for a selected stock.
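The three mutation operators can be sketched as follows. This is an illustrative Python version, not the paper's R implementation; the function names are hypothetical, and chromosomes are represented as plain tuples of four binary flags followed by four thresholds.

```python
import random


def repair(chromosome, rng):
    """Set one flag at random in any pair (b1, b2) or (b3, b4) that is all zero."""
    c = list(chromosome)
    for lo in (0, 2):
        if c[lo] + c[lo + 1] == 0:
            c[lo + rng.randrange(2)] = 1
    return tuple(c)


def flip_b(chromosome, rng):
    """b binary flip: flip one random binary gene, then repair."""
    c = list(chromosome)
    k = rng.randrange(4)
    c[k] = 1 - c[k]
    return repair(c, rng)


def random_v(chromosome, rng):
    """v random mutation: redraw one random threshold uniformly from [0, 1)."""
    c = list(chromosome)
    c[4 + rng.randrange(4)] = rng.random()
    return tuple(c)


def halve_v(chromosome, rng):
    """v mutation divided in half: halve one random threshold."""
    c = list(chromosome)
    k = 4 + rng.randrange(4)
    c[k] = c[k] / 2.0
    return tuple(c)


rng = random.Random(42)  # fixed seed only to make the illustration repeatable
parent = (0, 1, 1, 1, 0.4, 0.3, 0.5, 0.2)
children = [flip_b(parent, rng), random_v(parent, rng), halve_v(parent, rng)]
for child in children:
    print(child)
```

Only the binary flip can break the constraint that each pair of flags contains at least one 1, which is why it is the only operator that calls the repair step.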

Besides these operators, elitist selection is applied, and a number of random chromosomes
is added to each new population. The structure of the algorithm is a general genetic
algorithm, see e.g. Q~] for a description of this class of meta-heuristics. 

The analysis above is based on single assets. To compose a portfolio out of the single investment
strategies, the resulting portfolio is created for each day as an equally weighted combination of
all assets that are currently selected to be in a long position by their respective trading
strategies.
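The daily portfolio construction can be sketched as below, again in Python for illustration only. The function name `portfolio_returns`, the dictionary layout and the return values are hypothetical. How a day is treated on which no strategy holds a long position is not specified in the text; the sketch assumes a return of zero (holding cash) on such days.

```python
def portfolio_returns(asset_returns, long_flags):
    """Equal-weight the assets whose strategy is long on each day.

    asset_returns: dict ticker -> list of daily returns
    long_flags:    dict ticker -> list of booleans (True = long on that day)
    """
    tickers = list(asset_returns)
    n_days = len(asset_returns[tickers[0]])
    result = []
    for t in range(n_days):
        held = [a for a in tickers if long_flags[a][t]]
        if held:
            result.append(sum(asset_returns[a][t] for a in held) / len(held))
        else:
            result.append(0.0)  # assumption: stay in cash if nothing is held
    return result


# Hypothetical returns and positions for two tickers over three days.
returns = {"BA": [0.02, -0.01, 0.03], "CAT": [0.00, 0.04, -0.02]}
flags = {"BA": [True, True, False], "CAT": [True, False, False]}
print(portfolio_returns(returns, flags))
```

On the first day both assets are held with weight one half each, on the second day only BA is held, and on the third day the portfolio is empty.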

4 Numerical Results 

In this section we begin with a description of the data used to compute numerical results in
Section [4.1]. Section [4.2] summarizes the in-sample and out-of-sample results of the evolutionary
sentiment trading strategy. A short overview of classical risk-return portfolio optimization is
given in Section [4.3], and finally a performance comparison is presented in Section [4.4]. Everything
was implemented using the statistical computing language R [18].

4.1 Data 

We use data from all stocks from the Dow Jones Industrial Average (DJIA) index using the 
composition of September 20, 2013, i.e. using the stocks with the ticker symbols AXP, BA, CAT, 
CSCO, CVX, DD, DIS, GE, GS, HD, IBM, INTC, JNJ, JPM, KO, MCD, MMM, MRK, MSFT, 
NKE, PFE, PG, T, TRV, UNH, UTX, V, VZ, WMT, XOM. 

Training data is taken from the beginning of 2010 until the end of 2013. The out-of-sample 
tests are applied to data from the year 2014. 
4.2 Results of the Evolutionary Optimization

For each stock, the optimal strategy was computed. The evolutionary parameters were set
as follows:

• The initial population size has been set to 100, and each new population contains

• the 10 best chromosomes of the previous population (elitist selection), as well as

• 20 chromosomes generated by each of the three mutation operators described above, and

• 10 random chromosomes, such that the size of each new population is 80.
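One generation step with these parameters can be sketched as follows. The sketch is in Python for illustration and is not the paper's R code. The text does not state how parents for mutation are chosen, so uniform sampling from the previous population is an assumption; `resample_v` stands in for the three mutation operators, and `toy_fitness` is a placeholder for the actual objective (the ratio $s$ computed on in-sample returns). All names are hypothetical.

```python
import random

VALID_PAIRS = [(0, 1), (1, 0), (1, 1)]


def random_chromosome(rng):
    """Draw a valid chromosome (b1, b2, b3, b4, v1, v2, v3, v4)."""
    return (rng.choice(VALID_PAIRS) + rng.choice(VALID_PAIRS)
            + tuple(rng.random() for _ in range(4)))


def resample_v(chromosome, rng):
    """Stand-in mutation operator: redraw one random threshold."""
    c = list(chromosome)
    c[4 + rng.randrange(4)] = rng.random()
    return tuple(c)


def next_population(population, fitness, mutations, rng,
                    n_elite=10, n_per_mutation=20, n_random=10):
    """Elite + mutated + random chromosomes: 10 + 3 * 20 + 10 = 80 by default."""
    ranked = sorted(population, key=fitness, reverse=True)
    new_population = ranked[:n_elite]
    for mutate in mutations:
        # Assumption: parents are sampled uniformly from the previous population.
        new_population += [mutate(rng.choice(ranked), rng)
                           for _ in range(n_per_mutation)]
    new_population += [random_chromosome(rng) for _ in range(n_random)]
    return new_population


def toy_fitness(chromosome):
    """Placeholder objective; the paper maximizes the ratio s instead."""
    return -sum(chromosome[4:])


rng = random.Random(0)
initial = [random_chromosome(rng) for _ in range(100)]
offspring = next_population(initial, toy_fitness, [resample_v] * 3, rng)
print(len(offspring))
```

Passing the operators as a list keeps the generation step independent of the concrete mutation operators and of the chosen performance metric.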

For evaluation purposes, the parameter $s$ is maximized. Of course, the system is flexible enough
to use any other risk metric or a combination of metrics. See Tab. [4] for the in-sample performance
results, which compare a long-only buy-and-hold strategy of each asset with the best trading
strategy found for that asset. To give an impression of single strategy results, the best strategy
for AXP is (1, 1, 1, 0, 0.44, 0.41, 0.41, 0.17) and for BA (1, 0, 1, 0, 0.41, 0.37, 0.5, 0.41), while
for CAT it is (0, 1, 1, 1, 0.195, 0.34, 0.02, 0.24). The cumulative return performance $r$ is raised
(sometimes significantly) for almost all assets except for MCD, UTX, V. However, in those three
cases the decrease in profit is low. The standard deviation $\sigma$ is lower (i.e. better) in all cases,
which was expected as the algorithm leaves the long position for a certain time, such that the
standard deviation clearly has to decrease. The Sharpe-ratio like metric $s$ is better for all assets
but DIS, JNJ, UTX, XOM. Again, the loss in all four cases is low compared to the gain of the
other positions. In summary, the in-sample results show that the fitting of the algorithm works
very well.

4.3 Classical Portfolio Optimization 

To compare the performance of the portfolio created with single asset investment strategies
based on financial sentiments with a standard approach to portfolio optimization, we construct a
portfolio using classical risk-return portfolio selection techniques. Markowitz pioneered the idea of
risk-return optimal portfolios using the standard deviation of the portfolio's profit and loss function
as risk measure. In this case, the optimal portfolio $x$ is computed by solving the quadratic
optimization problem shown in Eq. (3). The investor needs to estimate a vector of expected
returns $r$ of the assets under consideration as well as the covariance matrix $C$. Finally, the
minimum return target $\mu$ has to be defined. Any standard quadratic programming solver can be
used to solve this problem numerically.


\[
\begin{aligned}
\text{minimize}\quad & x^\top C x\\
\text{subject to}\quad & r^\top x \geq \mu,\\
& \sum x = 1
\end{aligned}
\tag{3}
\]


In addition, we also compare the performance to the 1-over-N portfolio, which equally weights 
every asset under consideration. It has been shown that there are cases, where this simple strategy 
outperforms clever optimization strategies, see e.g. [10]. 

4.4 Performance Comparison 

The asset composition of the optimal Markowitz portfolio is shown in Tab. [5]; only eight out of
the 30 assets are selected. The underlying covariance matrix was estimated from daily returns
of the training data, i.e. using historical returns from the beginning of 2010 until the end of
Table 4: Single stock in-sample results of the evolutionary optimization.

          Long-only stock                Trading strategy
Ticker    $r$      $\sigma$   $s$        $r$      $\sigma$   $s$
AXP       1.223    0.016      0.057      1.574    0.013      0.068
BA        1.450    0.016      0.063      1.771    0.013      0.070
CAT       0.575    0.018      0.034      0.978    0.014      0.043
CSCO      -0.070   0.019      0.006      1.246    0.011      0.061
CVX       0.597    0.013      0.042      0.723    0.012      0.044
DD        0.912    0.015      0.051      1.177    0.011      0.070
DIS       1.351    0.015      0.065      1.522    0.013      0.065
GE        0.842    0.015      0.048      1.054    0.013      0.050
GS        0.042    0.019      0.012      0.662    0.010      0.041
HD        1.825    0.014      0.082      2.209    0.012      0.087
IBM       0.430    0.012      0.036      0.711    0.005      0.083
INTC      0.249    0.015      0.022      0.600    0.009      0.043
JNJ       0.415    0.008      0.045      0.415    0.008      0.041
JPM       0.399    0.019      0.027      0.873    0.016      0.037
KO        -0.277   0.019      -0.004     0.363    0.006      0.042
MCD       0.549    0.009      0.052      0.515    0.006      0.060
MMM       0.688    0.013      0.048      0.795    0.012      0.051
MRK       0.359    0.012      0.032      0.516    0.010      0.041
MSFT      0.222    0.014      0.021      0.509    0.012      0.031
NKE       0.190    0.022      0.022      0.511    0.019      0.030
PFE       0.677    0.012      0.048      0.805    0.011      0.049
PG        0.332    0.009      0.036      0.406    0.007      0.046
T         0.238    0.010      0.026      0.397    0.007      0.040
TRV       0.805    0.012      0.053      0.946    0.011      0.064
UNH       1.400    0.016      0.063      2.443    0.013      0.091
UTX       0.621    0.013      0.042      0.563    0.011      0.042
V         1.530    0.017      0.062      1.494    0.010      0.072
VZ        0.471    0.011      0.041      0.572    0.009      0.045
WMT       0.464    0.009      0.045      0.654    0.007      0.058
XOM       0.473    0.012      0.039      0.506    0.010      0.036
Table 5: Optimal Markowitz portfolio using daily return data from 2010-2013.

Ticker symbol           HD      JNJ     MCD     PG      UNH    V      VZ     WMT
Portfolio weight [%]    10.26   16.69   22.67   11.92   6.41   4.22   7.56   20.27


Table 6: Selected risk metrics for the different out-of-sample tests.

                              Markowitz   1-over-N   Evolutionary
Semi Deviation                0.0042      0.0048     0.0038
Downside Deviation (Rf=0%)    0.0040      0.0047     0.0037
Maximum Drawdown              0.0631      0.0687     0.0549
Historical VaR (95%)          -0.0092     -0.0105    -0.0081
Historical ES (95%)           -0.0125     -0.0157    -0.0124


2013. This portfolio is used as a buy-and-hold portfolio over the year 2014. This out-of-sample
performance is shown in Fig. [1]. While the performance of the 1-over-N portfolio is not shown
graphically, Fig. [2] depicts the performance of a portfolio that is created by equally weighting
all single asset trading strategies computed by the evolutionary optimization algorithm based on
financial sentiment data into one portfolio. To get a better impression of the differences, see Tab.
[6], where some important risk metrics are summarized for all three strategies. The evolutionary
trading portfolio exhibits better risk properties than both other portfolios in all five metrics.
Especially important is the reduction of the maximum drawdown, which matters to
asset managers nowadays because investors are increasingly looking at this metric when they are
searching for secure portfolios.

5 Conclusion 

In this paper an evolutionary optimization approach to compute optimal rule-based trading
strategies based on financial sentiment data has been developed. It can be shown that a portfolio
composed of the single trading strategies outperforms classical risk-return portfolio
optimization approaches in this setting. The next step is to include transaction costs to see how
much performance this active evolutionary strategy loses when transaction costs are considered. Future
extensions include extensive numerical studies on other indices as well as using and comparing
different evaluation risk metrics or a combination of metrics. One may also consider creating a
more flexible rule-generating algorithm, e.g. by using genetic programming. Finally, to achieve
an even better out-of-sample performance, the trading strategy can be recalibrated
every month using a rolling horizon approach.
Figure 2: Out-of-sample performance of an equally weighted portfolio of the evolutionary
sentiment trading strategies in 2014.